
Statistics Can Lie

2023-12-08 09:33 | Source: compiled from the web

Mark Twain once said that “Facts are stubborn things, but statistics are pliable.”


In statistics, the truth lies in the analysis, and that truth is fluid because there are scores of techniques for slicing and charting the same information. But not all analyses are created equal. The field is rich with opportunities for analysts to chart data and, occasionally, to misrepresent it. With information streaming onto social media and the news every minute, it is essential to recognize trickery, whether it stems from fraud or incompetence.


For example, this is a graph of Coronavirus deaths in Minnesota for the first 150 days of the pandemic using data provided by the Minnesota Department of Health. The top graph counts the deaths per day. The lower graph sums all of the Coronavirus deaths for the first 150 days of the pandemic.


Figure: Minnesota Coronavirus Deaths

At first glance, the daily death counts look like random noise that gradually subsides. Yet when summed over time, the data clearly shows a flattened curve. There are, however, many more ways to spin or graph the same data.
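The difference between the two views comes down to a running sum. The sketch below uses made-up daily counts (not the Minnesota Department of Health figures) to show how a noisy daily series turns into a smooth cumulative curve; the variable names are illustrative only.

```python
from itertools import accumulate

# Hypothetical daily death counts, invented for illustration.
# These are NOT the Minnesota Department of Health figures.
daily_deaths = [0, 1, 0, 3, 2, 5, 4, 6, 3, 2, 1, 1]

# Running total: each entry is the sum of all daily counts up to that day.
cumulative_deaths = list(accumulate(daily_deaths))

print("daily:     ", daily_deaths)
print("cumulative:", cumulative_deaths)
```

The daily list jumps up and down, while the cumulative list can only rise and levels off as the daily counts shrink, which is the "flattened curve" shape.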


1. Sample Bias


Figure: Sample Bias in Coronavirus deaths

The first way to spin the data is sample bias. This bias arises when a researcher cherry-picks information to support a half-truth. In this example I have excluded fifty days of death data. Student News Daily recently reviewed the New York Times and found that it had excluded New York from its chart on the growth of new positive Coronavirus cases. But not all bias comes from half-truths. For example, news stations occasionally run audience polls on a variety of topics. These polls tend to be flawed because they rely on voluntary responses, so only viewers motivated enough to answer are counted. The results can be skewed further by undercoverage and even by the wording of the question.
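As a rough illustration of cherry-picking, the sketch below measures the change across a made-up series twice: once over the whole period and once over a hand-picked window. The numbers and the helper `net_change` are assumptions for this example, not the data behind the chart.

```python
def net_change(series):
    """Change from the first to the last value of a series."""
    return series[-1] - series[0]

# Hypothetical daily counts that rise to a peak and then fall.
daily = [2, 5, 9, 14, 20, 26, 30, 27, 22, 18, 14, 10]

full_period = net_change(daily)        # whole record
cherry_picked = net_change(daily[6:])  # starts at the peak, drops the rise

print("full period:  ", full_period)
print("cherry-picked:", cherry_picked)
```

The same record supports "counts are up" or "counts are down sharply" depending on which days are left out, which is why the sampling window deserves scrutiny.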


2. Wrong Scales


Figure: Wrong scale of deaths (logarithmic)

Scaling bias is another tool of the half-truth researcher. It is achieved simply by rescaling the graph. In this illustration I changed the linear scale on the left axis into a much more alarming logarithmic scale. The deception can still be spotted because the left axis does not start at zero. News organizations can use this method to hide the truth in plain sight, knowing that their audience rarely reads the scales, or they can leave the scale off altogether.


Logarithmic scales still have legitimate uses, for example in financial markets, where they represent percentage changes. Log scales are attractive because equal percentage changes cover equal distances on the axis, so a 100% increase in the price of a stock or commodity looks the same whether it starts from 10 or from 100.
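The property that makes log scales useful for prices can be checked with a few lines of arithmetic. This is a minimal sketch with invented prices; `log_distance` is a hypothetical helper standing in for the vertical distance between two points on a base-10 log axis.

```python
import math

def log_distance(start, end):
    """Vertical distance between two values on a base-10 log axis."""
    return math.log10(end) - math.log10(start)

# Two 100% increases at very different price levels (made-up prices).
cheap_stock = log_distance(10, 20)
pricey_stock = log_distance(100, 200)

# On a linear axis the same moves span 10 and 100 units respectively.
linear_cheap = 20 - 10
linear_pricey = 200 - 100

print(cheap_stock, pricey_stock)
print(linear_cheap, linear_pricey)
```

Both doublings cover the same distance on the log axis, while on the linear axis the second looks ten times larger. The same compression is what makes a log axis misleading when readers expect a linear one.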


3. Statistical Non-Correlation


Figure: Statistical Non-Correlation of deaths

In statistics, the researcher's task is often to show whether a variable X correlates with a variable Y. In a positive correlation, Y rises as X rises and falls as X falls. In a negative correlation, Y falls as X rises. There is also the possibility that X and Y both depend on a third variable Z. For example, X and Y could be the heat and pressure that both depend on burn rate Z. The final option is that the variables are totally unrelated.
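The third-variable case can be sketched numerically. In the example below, heat and pressure are each generated from burn rate plus a little fixed noise; the coefficients, noise values, and the `pearson` helper are all assumptions made for illustration, not measurements.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical driver Z and fixed noise terms (invented values).
burn_rate = [1, 2, 3, 4, 5, 6, 7, 8]
heat_noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
pressure_noise = [-0.1, 0.2, -0.3, 0.1, 0.3, -0.2, 0.1, -0.1]

# X and Y are each computed from Z only; neither is computed from the other.
heat = [2.0 * z + e for z, e in zip(burn_rate, heat_noise)]
pressure = [0.5 * z + e for z, e in zip(burn_rate, pressure_noise)]

r = pearson(heat, pressure)
print(round(r, 3))
```

Heat and pressure come out strongly correlated even though neither appears in the other's formula. The coefficient alone cannot distinguish this situation from one where X drives Y directly.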


In the chart above, the number of Coronavirus deaths per day in Minnesota is displayed on the left, next to the Intensive Care Unit (ICU) hospitalizations on the center right. For clarity, the average number of deaths on the center left is compared with the average number of ICU hospitalizations on the right. On the surface, the ICU hospitalizations appear to correlate with the Coronavirus deaths, and an untrained observer would conclude that they do. This is a bogus correlation.


The problem with statistics is that it has been carefully crafted over the centuries to find correlations in information, and sometimes the correlation it finds is fictional. Besides false correlations, mathematicians also have to contend with the illusions of Simpson’s Paradox: a trend can appear within individual sets of data but vanish, or even reverse, when those sets are combined. For example, the University of California, Berkeley was once accused of favoring male applicants over female applicants. Yet when the admission statistics were analyzed by individual discipline, women did as well as or better than men in most of them.
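A small worked example shows how such a reversal can happen. The admission counts below are invented for illustration and are not the actual Berkeley figures; the two departments and the helper functions are likewise hypothetical.

```python
# Hypothetical (admitted, applied) counts per department and group.
# These are NOT the real Berkeley admission figures.
applicants = {
    "A": {"men": (80, 100), "women": (18, 20)},   # easy department
    "B": {"men": (4, 20), "women": (30, 100)},    # competitive department
}

def dept_rate(dept, group):
    """Admission rate for one group within one department."""
    admitted, applied = applicants[dept][group]
    return admitted / applied

def overall_rate(group):
    """Admission rate for one group pooled across all departments."""
    admitted = sum(d[group][0] for d in applicants.values())
    applied = sum(d[group][1] for d in applicants.values())
    return admitted / applied

for dept in applicants:
    print(dept, "men:", dept_rate(dept, "men"), "women:", dept_rate(dept, "women"))
print("overall men:", overall_rate("men"), "women:", overall_rate("women"))
```

In this toy data women have the higher admission rate in both departments, yet men have the higher rate overall, because most women applied to the department that rejects most applicants.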


My deception becomes obvious once trend lines are added in the next graph.


Figure: Adjusted Correlation of deaths
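A trend line is just a least-squares fit, and its slope is a quick check on whether two series really move together. The sketch below assumes, purely for illustration, one series that drifts down and one that drifts up; the values and the `trend_slope` helper are hypothetical and do not reproduce the chart's data.

```python
def trend_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Invented series: similar ranges, opposite directions.
series_a = [9, 8, 8, 6, 5, 5, 3, 2]
series_b = [3, 4, 4, 5, 6, 6, 7, 8]

slope_a = trend_slope(series_a)
slope_b = trend_slope(series_b)
print(slope_a, slope_b)
```

Two noisy charts placed side by side can look alike, but slopes of opposite sign show that the series are heading in different directions.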

It is these interpretations that leave the discipline open to incompetence, fraud, and the general undermining of the scientific process.


Outsmarting The Fake News


In the age of fake news it is easy to be manipulated, but you don’t have to be a fool. You just need to ask yourself four questions any time you see a chart.


Is there a bias in the sampling of the information?

Is there another variable that is affecting the data?

Could additional research contradict this finding?

Is the data being overly generalized?

